Csaba Szepesvári

Efficient Simple Regret Algorithms for Stochastic Contextual Bandits

Jan 29, 2026

Eluder dimension: localise it!

Jan 14, 2026

Frontier LLMs Still Struggle with Simple Reasoning Tasks

Jul 09, 2025

Almost Free: Self-concordance in Natural Exponential Families and an Application to Bandits

Oct 01, 2024

Confident Natural Policy Gradient for Local Planning in $q_\pi$-realizable Constrained MDPs

Jun 26, 2024

Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear $q^\pi$-Realizability and Concentrability

May 27, 2024

Regret Minimization via Saddle Point Optimization

Mar 15, 2024

Switching the Loss Reduces the Cost in Batch Reinforcement Learning

Mar 12, 2024

Ensemble sampling for linear bandits: small ensembles suffice

Nov 14, 2023

Exploration via linearly perturbed loss minimisation

Nov 13, 2023